Semantic understanding of visual scenes is one of the holy grails of computer vision. Despite the community's efforts in data collection, there are still few image datasets covering a wide range of scenes and object categories with pixel-wise annotations for scene understanding. In this work, we present ADE20K, a densely annotated dataset that spans diverse annotations of scenes, objects, parts of objects, and in some cases even parts of parts. In total there are 25k images of complex everyday scenes containing a variety of objects in their natural spatial context, with on average 19.5 instances and 10.5 object classes per image. Based on ADE20K, we construct benchmarks for scene parsing and instance segmentation. We provide baseline performances on both benchmarks and re-implement state-of-the-art models as open source. We further evaluate the effect of synchronized batch normalization and find that a reasonably large batch size is crucial for semantic segmentation performance. We show that networks trained on ADE20K are able to segment a wide variety of scenes and objects.
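As a hedged illustration of the batch-normalization point (not the authors' released code), PyTorch's built-in `SyncBatchNorm` can synchronize normalization statistics across GPUs in a distributed job, so the effective batch size used for normalization stays large:

```python
import torch
import torch.nn as nn

# A minimal sketch: any segmentation backbone that uses BatchNorm layers.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# Convert every BatchNorm layer to its synchronized variant; inside a
# torch.distributed data-parallel run the statistics are then computed
# across all participating GPUs, mimicking training with a larger batch.
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
```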
Graph Neural Networks (GNNs) have shown great potential in the field of graph representation learning. Standard GNNs define a local message-passing mechanism that propagates information over the whole graph domain by stacking multiple layers. This paradigm suffers from two major limitations, over-squashing and poor capture of long-range dependencies, which can be solved using global attention but at the cost of quadratic computational complexity. In this work, we propose an alternative approach to overcome these structural limitations by leveraging the ViT/MLP-Mixer architectures introduced in computer vision. We introduce a new class of GNNs, called Graph MLP-Mixer, that holds three key properties. First, they capture long-range dependencies and mitigate the issue of over-squashing, as demonstrated on the Long Range Graph Benchmark (LRGB) and the TreeNeighbourMatch datasets. Second, they offer better speed and memory efficiency, with complexity linear in the number of nodes and edges, surpassing the related Graph Transformer and expressive GNN models. Third, they show high expressivity in terms of graph isomorphism, as they can distinguish at least 3-WL non-isomorphic graphs. We test our architecture on 4 simulated datasets and 7 real-world benchmarks, and show highly competitive results on all of them.
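To make the patch-mixing idea concrete, here is a minimal sketch of one Mixer layer operating on graph patch embeddings; the upstream patch extraction (e.g., graph partitioning followed by a small GNN per patch) is assumed to happen elsewhere, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn

class MixerLayer(nn.Module):
    """One MLP-Mixer layer on graph patch embeddings of shape
    (batch, num_patches, dim), assumed to come from per-patch GNN encoders."""

    def __init__(self, num_patches: int, dim: int, hidden: int = 128):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token mixing: an MLP across the patch axis, giving every patch a
        # direct (long-range) path to every other patch in one layer.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_patches, hidden), nn.GELU(), nn.Linear(hidden, num_patches)
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel mixing: an MLP across the feature axis of each patch.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.norm1(x).transpose(1, 2)           # (B, dim, num_patches)
        x = x + self.token_mlp(y).transpose(1, 2)   # residual token mixing
        x = x + self.channel_mlp(self.norm2(x))     # residual channel mixing
        return x

patches = torch.randn(8, 32, 64)  # 8 graphs, 32 patches, 64-dim embeddings
out = MixerLayer(num_patches=32, dim=64)(patches)
```

The cost of both MLPs is linear in the number of patches, which is the source of the linear-complexity claim above.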
This paper is devoted to the numerical solution of McKean-Vlasov control problems via the class of mean-field neural networks introduced in our companion paper [25], which learn the solution on the Wasserstein space. We propose several algorithms, based either on dynamic programming, with control learning by policy or value iteration, or on backward SDEs from the stochastic maximum principle, with global or local loss functions. Extensive numerical results on different examples are presented to illustrate the accuracy of each of our eight algorithms, and we discuss and compare the pros and cons of all the tested methods.
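As a rough sketch of the "global loss" flavor of such algorithms (a toy illustration, not one of the paper's eight methods): parametrize the feedback control by a neural network, simulate the controlled mean-field SDE forward with an Euler scheme, and minimize a Monte Carlo estimate of the cost. The dynamics, costs, and mean-field interaction below are illustrative placeholders:

```python
import torch
import torch.nn as nn

N, T, dt = 512, 20, 0.05                         # particles, steps, step size
policy = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 1))
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(200):
    x = torch.zeros(N, 1)                        # particle system for the law
    cost = torch.zeros(N, 1)
    for k in range(T):
        m = x.mean().expand_as(x)                # empirical mean-field term E[X_t]
        a = policy(torch.cat([x, m], dim=1))     # feedback control a(x, law)
        cost = cost + dt * (x**2 + a**2)         # running quadratic cost
        x = x + dt * (a + 0.5 * (m - x)) + dt**0.5 * torch.randn(N, 1)
    loss = (cost + x**2).mean()                  # add terminal cost X_T^2
    opt.zero_grad()
    loss.backward()
    opt.step()
```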
This paper describes the system developed at the Universitat Politècnica de Catalunya for the Workshop on Machine Translation 2022 Sign Language Translation Task, in particular for the sign-to-text direction. We use a Transformer model implemented with the Fairseq modeling toolkit. We experimented with the vocabulary size, data augmentation techniques, and pretraining the model on the PHOENIX-14T dataset. Our system obtains a BLEU score of 0.50 on the test set, improving on the organizers' baseline by 0.38 BLEU. We note the poor results of both the baseline and our system and, consequently, the unreliability of our findings.
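For context, scores like these are typically computed with corpus-level BLEU; a minimal sketch using the standard sacrebleu toolkit (the hypotheses and references below are illustrative, not task data):

```python
import sacrebleu

# Corpus-level BLEU over a list of system outputs and one reference set.
hypotheses = ["the weather is nice today", "he goes to school"]
references = ["the weather today is nice", "he goes to school"]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```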
This paper studies the infinite-width limit of deep linear neural networks initialized with random parameters. We show that, as the number of neurons diverges, the training dynamics converge (in a precise sense) to the dynamics obtained from gradient descent on an infinitely wide deterministic linear neural network. Moreover, even though the weights remain random, we obtain their precise law along the training dynamics and prove a quantitative convergence result for the linear predictor in terms of the number of neurons. We finally study the continuous-time limit obtained for infinitely wide linear networks and show that the linear predictors of the network converge at an exponential rate to the minimal $\ell_2$-norm minimizer of the risk.
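A toy numerical check of this limiting behavior (widths, step size, and the small-initialization scale are illustrative, and a finite-width run only approximates the stated limit): train a wide two-layer linear network on an underdetermined regression problem and compare its end-to-end predictor with the minimal $\ell_2$-norm interpolator:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width = 20, 50, 2000                 # fewer samples than features
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Small random initialization of a two-layer linear network x -> w2^T W1 x.
W1 = rng.standard_normal((width, d)) / np.sqrt(width * d)
w2 = rng.standard_normal(width) / np.sqrt(width)

lr = 0.05
for _ in range(5000):
    beta = W1.T @ w2                       # end-to-end linear predictor
    grad_beta = X.T @ (X @ beta - y) / n   # gradient of the squared risk in beta
    W1 -= lr * np.outer(w2, grad_beta)     # chain rule through the product
    w2 -= lr * (W1 @ grad_beta)

beta_min_norm = np.linalg.pinv(X) @ y      # minimal l2-norm interpolator
print(np.linalg.norm(W1.T @ w2 - beta_min_norm))  # small residual gap
```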
Recent advances in deep learning models for sequence classification have greatly improved their classification accuracy, especially when large training sets are available. However, several works have suggested that under some settings the predictions made by these models are poorly calibrated. In this work we study binary sequence classification problems and look at model calibration from a different perspective, asking: are deep learning models capable of learning the underlying target class distribution? We focus on sparse sequence classification, that is, problems in which the target class is rare, and compare three deep learning sequence classification models. We develop an evaluation that measures how well a classifier learns the target class distribution. In addition, our evaluation disentangles good performance achieved by mere compression of the training sequences from performance achieved by proper model generalization. Our results suggest that in this binary setting the deep learning models are indeed able to learn the underlying class distribution in a non-trivial manner, i.e., by generalizing properly rather than merely compressing the data.
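One hedged way to operationalize the question (a sketch, not the paper's exact evaluation): on held-out data, compare the empirical rate of the rare class with the model's mean predicted probability, which estimates the class prior when the model is calibrated. The arrays below are illustrative stand-ins for real model outputs and labels:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.random(10_000) < 0.03          # ~3% positives (rare class)
p_pred = np.clip(y_true * 0.5 + 0.01, 0, 1)  # placeholder model scores

true_rate = y_true.mean()                    # empirical P(y = 1)
predicted_rate = p_pred.mean()               # E[p] under the model

print(f"true positive rate        : {true_rate:.4f}")
print(f"mean predicted probability: {predicted_rate:.4f}")
# If the model has learned the class distribution, these two numbers
# should agree up to sampling noise.
```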
Combining machine learning models with physical models is a recent research path for learning robust data representations. In this paper, we introduce p$^3$VAE, a generative model that integrates a perfect physical model which partially explains the true underlying factors of variation in the data. To fully leverage our hybrid design, we propose a semi-supervised optimization procedure and an inference scheme that comes with meaningful uncertainty estimates. We apply p$^3$VAE to the semantic segmentation of high-resolution hyperspectral remote sensing images. Our experiments on a simulated dataset demonstrate the benefits of our hybrid model over conventional machine learning models in terms of extrapolation capabilities and interpretability. In particular, we show that p$^3$VAE naturally has high disentanglement capabilities. Our code and data have been made publicly available at https://github.com/Romain3Ch216/p3VAE.
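As an illustrative reading of the hybrid design (the decoder below is an assumption for exposition, not the paper's architecture): part of the latent code drives a known, differentiable physical model, while a neural network explains the residual factors of variation:

```python
import torch
import torch.nn as nn

def physics_model(z_phys: torch.Tensor) -> torch.Tensor:
    # Placeholder for a perfect, differentiable physical forward model
    # (e.g., a radiative transfer model in the hyperspectral setting).
    return torch.sin(z_phys)

class HybridDecoder(nn.Module):
    """Illustrative decoder combining physical and neural components."""

    def __init__(self, d_phys: int = 2, d_aux: int = 8, d_out: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_aux, 64), nn.ReLU(), nn.Linear(64, d_out))
        self.mix = nn.Linear(d_phys + d_out, d_out)

    def forward(self, z_phys: torch.Tensor, z_aux: torch.Tensor) -> torch.Tensor:
        # Physics explains part of the signal; the network fills in the rest.
        return self.mix(torch.cat([physics_model(z_phys), self.net(z_aux)], dim=-1))

x_hat = HybridDecoder()(torch.randn(4, 2), torch.randn(4, 8))
```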
Over the past few years, neural networks (NNs) have evolved from laboratory settings to the state of the art on many real-world problems. It has been shown that NN models (i.e., their weights and biases) evolve along unique trajectories in weight space during training. Consequently, a population of such neural network models (referred to as a model zoo) forms structures in weight space. We argue that the geometry, curvature, and smoothness of these structures contain information about the state of training and can reveal latent properties of individual models. With such model zoos, one can investigate (i) novel approaches to model analysis, (ii) the discovery of unknown learning dynamics, (iii) learning rich representations of such populations, or (iv) exploiting model zoos for generative modelling of NN weights and biases. Unfortunately, the lack of standardized model zoos and available benchmarks significantly increases the friction for further research on NN populations. With this work, we publish a novel model zoo dataset containing systematically generated and diverse populations of NN models for further research. In total, the proposed dataset is based on eight image datasets and consists of 27 model zoos trained with different hyperparameter combinations, comprising 50'360 unique NN models as well as their sparsified twins, resulting in over 3'844'360 collected model states. In addition to the model zoo data, we provide an in-depth analysis of the zoos and benchmarks for multiple downstream tasks. The dataset is available at www.modelzoos.cc.
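A minimal sketch of how such a zoo might be consumed for analysis (the checkpoint format and paths are assumptions; see www.modelzoos.cc for the actual layout): flatten each model's weights into one vector, so a population of models becomes a matrix amenable to PCA, clustering, or representation learning:

```python
import torch

def flatten_checkpoint(path: str) -> torch.Tensor:
    """Load a checkpoint assumed to be a plain state_dict and flatten all
    parameter tensors into a single weight vector."""
    state = torch.load(path, map_location="cpu")
    return torch.cat([p.flatten() for p in state.values()])

# Stacking a population gives a (num_models, num_weights) matrix:
# population = torch.stack([flatten_checkpoint(p) for p in checkpoint_paths])
```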
Learning representations of neural network weights given a model zoo is an emerging and challenging field, with many potential applications ranging from model inspection to neural architecture search and knowledge distillation. Recently, autoencoders trained on model zoos were able to learn a hyper-representation that captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use, in order to sample new model weights. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, together with several sampling methods based on the topology of the hyper-representation. The models generated with our methods are diverse and performant, and capable of outperforming strong baselines, as evaluated on several downstream tasks: initialization, ensemble sampling, and transfer learning. Our results indicate the potential of aggregating knowledge from model zoos into new models via hyper-representations, paving the way for novel research directions.
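One plausible reading of layer-wise loss normalization (the exact normalizer below is an assumption, not the paper's definition): normalize the reconstruction error of each layer by that layer's own scale, so that large layers do not dominate the autoencoder objective:

```python
import torch

def layerwise_loss(recon: dict, target: dict) -> torch.Tensor:
    """Reconstruction loss over a weight dict, normalized per layer.

    recon/target map layer names to weight tensors of matching shapes.
    """
    losses = []
    for name in target:
        err = (recon[name] - target[name]).pow(2).mean()
        scale = target[name].pow(2).mean().clamp_min(1e-8)  # per-layer scale
        losses.append(err / scale)                          # normalized error
    return torch.stack(losses).mean()
```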
In this paper, we introduce a novel offline method, based on the learning-from-demonstration (LfD) paradigm, to find suitable parameters for variable impedance control (VIC) that satisfy stability and performance constraints while taking the user's intuition about the task into account. Given compliance profiles obtained from human demonstrations, a linear parameter-varying (LPV) formulation of the VIC is derived, which allows the design problem, including stability and performance constraints, to be stated as linear matrix inequalities (LMIs). Then, using a solution-search method, we find an optimal solution according to user preferences on the task behavior. The design problem is validated by comparing the execution of the obtained controllers against solutions designed for different sets of user preferences in a two-dimensional trajectory-tracking task. A pulley-threading task is presented as a case study to evaluate the performance of the variable impedance controllers designed with the user-preference mechanism against constant-stiffness controllers. All experiments were carried out with a 7-DOF Kinova Gen3 manipulator.
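To make the LMI step concrete, here is a minimal sketch using cvxpy (the vertex matrices are illustrative, not from the paper): certify stability of an LPV system by searching for a common Lyapunov matrix $P \succ 0$ satisfying $A_i^\top P + P A_i \prec 0$ at every vertex:

```python
import cvxpy as cp
import numpy as np

# Illustrative vertex dynamics of a polytopic LPV system.
A1 = np.array([[0.0, 1.0], [-2.0, -1.0]])
A2 = np.array([[0.0, 1.0], [-4.0, -2.0]])

P = cp.Variable((2, 2), symmetric=True)
eps = 1e-3
constraints = [P >> eps * np.eye(2)]          # P positive definite
for A in (A1, A2):
    # Lyapunov LMI at each vertex: A^T P + P A negative definite.
    constraints.append(A.T @ P + P @ A << -eps * np.eye(2))

cp.Problem(cp.Minimize(0), constraints).solve()
print("common Lyapunov matrix P =\n", P.value)
```

A feasible P certifies stability for every dynamics in the convex hull of the vertices; performance constraints would enter as additional LMIs in the same program.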